Aggression and Complexity in Trump’s 2020 Presidential Campaign Rhetoric
1 Introduction
Research Question: Is there a correlation between aggressivity and rhetorical complexity in Donald Trump’s 2020 presidential campaign speeches?
Donald Trump’s rhetorical style is the defining feature of his political identity: confrontational and simplistic. With his recent return to the presidential office for a second term, his words have gained scrutiny across the globe. Understanding the impact of his rhetoric is a fundamental step in analyzing how politicians use public messaging to amass supporters, distract political opponents, and control their media image.
2 Data
This analysis uses Trump’s campaign speeches from the 2020 presidential election to assess if, or to what extent, there is a correlation between aggressivity and rhetorical simplicity. By looking at these two elements of speech side by side, we can gain a better understanding of whether more combative speeches are simplified to reach larger audiences, and of what kind of relationship exists between aggressivity and complexity in a textual context.
2.1 Dataset Collection
Chalkiadakis, Ioannis; Anglès d’Auriac, Louise; Peters, Gareth; and Frau-Meigs, Divina. A text dataset of campaign speeches of the main tickets in the 2020 US presidential election (September 20, 2024).
- The dataset consists of 235 official transcripts of Donald Trump’s speeches throughout his 2020 presidential campaign, from January 2019 through January 2021.
3 Aggressivity
Donald Trump is notorious for his aggressive speeches, tweets, and press statements, stirring debates every time he speaks. His strong populist, right-wing rhetoric has contributed greatly to an already polarized political system.
This section aims to analyze and interpret the average monthly aggression ratio for each month from January 2019 to January 2021, along with a 2-month rolling average for trend visualization. The aggression ratio is the percentage of a speech’s words that match a custom lexicon of aggressive and negative terms (listed in full in the Appendix); a minimal sketch of the computation follows.
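A minimal sketch of the ratio computation, assuming the american_words lexicon and the Trumpdf DataFrame built in the Appendix:

import re

# Word-boundary-safe pattern over the aggression lexicon (see Appendix)
pattern = re.compile(
    r'\b(' + '|'.join(re.escape(w) for w in american_words) + r')\b',
    re.IGNORECASE,
)

def aggression_ratio(text: str) -> float:
    # Lexicon hits as a percentage of all words in the speech
    total = len(re.findall(r'\b\w+\b', text))
    return len(pattern.findall(text)) / total * 100 if total else 0.0

Trumpdf["neg_ratio"] = Trumpdf["CleanText"].astype(str).apply(aggression_ratio)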
3.1 Monthly Average Aggression Ratio
There is a clear spike in August 2019, when Trump was at his most aggressive, due to his infamous attacks on his former advisor, the former F.B.I. and C.I.A. directors, and others. There were other, smaller spikes in the spring of 2019 and again in January 2020; the former correlates with his crackdown on various social welfare programs, such as the Affordable Care Act, and his transgender military ban, whereas the latter correlates with his expansion of the Muslim ban and strict policing of the southern border.
3.2 Analysis of Aggression Above the 75th Percentile
This section takes a closer look at the trends in high aggression. To do so, I analyze the subset of speeches above the 75th percentile, i.e., those with an aggression ratio above 0.206258. The subset consisted of 21 speeches out of the 235 total. The threshold can be derived with a quantile computation, as sketched below.
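A minimal sketch of deriving the percentile threshold and subset, assuming the Trumpdf frame from the Appendix (the printout is illustrative, not part of the original pipeline):

# 75th-percentile cutoff of the per-speech aggression ratio
threshold = Trumpdf["neg_ratio"].quantile(0.75)

# Keep only the most aggressive quarter of speeches
subset_df = Trumpdf[Trumpdf["neg_ratio"] > threshold]
print(f"Threshold: {threshold:.6f}; speeches above it: {len(subset_df)}")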
4 Rhetorical Complexity
Donald Trump’s simple language is a key characteristic of his political persona as well: he speaks in short, simple sentences and phrases, without much deeper rhetorical meaning or complexity. By measuring complexity with the Flesch reading-ease score (flesch_score), where a higher score indicates easier reading, we can assess if, or to what extent, a relationship exists between his aggressivity and the complexity of his speech. A minimal sketch of the scoring and a direct correlation check follows.
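A minimal sketch, assuming the Trumpdf frame with the neg_ratio column from the Appendix; the Spearman correlation call is my addition and not part of the original pipeline:

from textstat import flesch_reading_ease

# Higher Flesch scores mean easier reading; lower scores mean more complex language
Trumpdf["flesch_score"] = Trumpdf["CleanText"].astype(str).apply(flesch_reading_ease)

# Rank-based correlation between aggressivity and reading ease;
# a negative value indicates that more aggressive speeches read as more complex
print(Trumpdf["neg_ratio"].corr(Trumpdf["flesch_score"], method="spearman"))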
4.1 Monthly Average Flesch Score
Trump’s linguistic complexity also saw spikes over the course of his campaign; in the first several months, his speeches received higher flesch_score values, which indicate greater ease of reading. His rhetoric became more complex from December 2019 through May 2020, with an outlier spike in February. The increased complexity in Trump’s speeches correlates with the beginning of the Covid-19 pandemic and his counterterrorism operation in Yemen, both of which prompted more sophisticated speeches.
4.2 Analysis of Flesch Score Above the 75th Percentile
This section takes a closer look at the trends in low complexity. I do so by examining speeches with a flesch_score above 68.72 (the 75th percentile, derived with the same quantile computation as in Section 3.2), where Trump’s speeches were the simplest. The subset consists of 59 speeches out of the 235 total.
5 Topic Modeling: Aggression in the 75th Percentile
Now that the time trend of aggression has been established, this section explores which topics produced the highest levels of aggression in Trump’s 2020 presidential campaign speeches. To do so, I perform Latent Dirichlet Allocation (LDA) to identify the top topics in the 75th percentile of aggressive speeches; a sketch for tallying each speech’s dominant topic follows the topic list below.
LatentDirichletAllocation(n_components=7, random_state=42)
Topic #1: know said peopl want dont say thing great think right
Topic #2: woman nation iran futur countri american terror state continu busi
Topic #3: race state unit sex order nation american feder act agenc
Topic #4: countri american border year biden peopl nation america presid want
Topic #5: iran world unit nation state american china peopl year hong
Topic #6: thank american america nation great peopl state unit child histori
Topic #7: divis holocaust appoint th unit woman act secretari crime day
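To gauge which of these topics dominates the aggressive subset, each speech can be assigned its highest-probability topic and the assignments tallied; a minimal sketch, assuming the lda model, dtm matrix, and subset_df frame from the Appendix:

# Assign each speech its most probable topic, then count speeches per topic
topic_results = lda.transform(dtm)
subset_df['DominantTopic'] = topic_results.argmax(axis=1)
print(subset_df['DominantTopic'].value_counts().sort_index())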
5.1 WordClouds Representing Top Topics: AGGRESSION
6 Topic Modeling: Simplicity in the 75th Percentile
This section performs the same analysis on the speeches with a flesch_score above the 75th percentile, using the same Latent Dirichlet Allocation (LDA) approach to identify the top topics.
LatentDirichletAllocation(n_components=6, random_state=42)
Topic #1: peopl think good lot thing great number want countri meet
Topic #2: thank peopl great know said want think countri like say
Topic #3: know said peopl dont want year laughter say right great
Topic #4: crowd number great happen mani come know im weve thank
Topic #5: peac heart brother robert wonder memori tonight live best forev
Topic #6: presid trump said know want dont peopl year say biden
6.1 WordClouds Representing Top Topics: SIMPLICITY
7 Conclusion
In conclusion, a definitive correlation between Trump’s aggressivity and rhetorical complexity cannot be established. However, a moderate inverse relationship can be identified between aggressivity and flesch_score reading ease: the more aggressive a speech was, the lower its reading-ease score, and thus the more complex its language tended to be. When placed in political context, the spikes and unusual outliers are explained by changes in policy or by current events, such as the Covid-19 pandemic.
The analysis also highlighted an interesting trend in how Trump campaigned in the 2020 presidential election, and which topics were most prevalent in aggressively charged speeches. Topics linked to higher levels of aggression focused on justice and order, fake news, China, and immigration. Topics linked to linguistic simplicity included his brother’s passing, patriotism, and policy.
7.1 Future Research
To further investigate the relationship between rhetorical complexity and aggressivity, I recommend using a larger dataset that incorporates a more diverse selection of documents. For further research on Trump, this should include tweets, statements made on social media, and transcriptions of video clips. Trump has been known to make inflammatory remarks about political opponents on social media, and this would be a more precise avenue for deeper analysis.
8 Appendix
This section shows the code I used to perform my analysis.
Monthly Average Aggression Ratio
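# Custom lexicon of aggressive and negative words used to score each speech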
american_words = [
"abuse", "abysmal", "accusation", "accusations", "accuse", "accusing", "adversarial",
"aggressive", "anger", "angered", "annoyance", "annoyed", "annoying", "antagonistic",
"antagonize", "appalling", "archaic", "arrogance", "arrogant", "ashamed", "assault",
"assaulted", "assaulting", "attacking", "atrocious", "backtalk", "bitter", "bitterly",
"bitterness", "blackened", "blackmail", "blame", "blamed", "blaming", "blunder", "bogus",
"botch", "botched", "betray", "betrayed", "betrayal", "clownery", "chaos", "chaotic",
"complain", "complaining", "condemn", "confront", "confrontation", "confrontational",
"crass", "coward", "cowardly", "criticize", "criticized", "criticizing", "cruel", "cruelty",
"debase", "debased", "deceit", "deceived", "deceive", "deception", "devious", "deviousness",
"despicable", "disgrace", "disgraceful", "disgusting", "dishonest", "dishonorable",
"disregard", "disreputable", "distasteful", "dodgy", "dull", "embarrass", "embarrassing",
"embarrassment", "fabricator", "fail", "failed", "failure", "failures", "faithless", "farcical",
"fiasco", "fibber", "fiddle", "fiddled", "fool", "foolish", "fraud", "fraudulence",
"fraudulent", "furious", "gimmick", "good-for-nothing", "groan", "grotesque", "hackery",
"half-truths", "hate", "hatred", "hodgepodge", "horrendous", "hostile", "hostility",
"humiliate", "humiliating", "hypocrisy", "hypocrite", "idiot", "idiotic", "ignorance",
"ignorant", "ill-judged", "ill-mannered", "immoral", "inadequacy", "incapable", "inferior",
"insult", "insulted", "insulting", "intolerant", "ironic", "irony", "irritated", "jumble",
"laughable", "lawbreakers", "leech", "libelous", "ludicrous", "mess", "misbehave", "mischief",
"mischievous", "mislead", "misleading", "needless", "needlessly", "neglect", "neglected",
"neglectful", "negligent", "nonsense", "nonsensical", "nasty", "obnoxious", "offend",
"offenders", "outrageous", "outraged", "patronize", "patronizing", "petty", "penny-pinching",
"phony", "petulant", "prejudice", "prejudices", "predictable", "problematic", "provoke",
"provoked", "ridicule", "ridiculous", "reprehensible", "rude", "scandal", "scandalous",
"scapegoat", "scapegoats", "scaremonger", "scaremongering", "shady", "shameful", "shambles",
"sham", "shenanigans", "short-sighted", "silly", "silliness", "slander", "slanderous",
"sleaze", "sleazy", "sly", "slyness", "smokescreen", "sneaky", "spite", "spiteful", "steal",
"stereotyping", "stubborn", "stupid", "stupidity", "subterfuge", "swindling", "tactic",
"talking back", "trick", "trickery", "unacceptable", "unhelpful", "unnatural", "untrue",
"undermine", "outrageous", "vindictive", "villain", "woeful", "wrong"
]

import pandas as pd
import json
# Path to your file
file_path = '/Users/KaylaMuller/desktop/text_analysis/week12/cleantext_DonaldTrump.jsonl.txt'
# Read the file line by line and parse each line as JSON
data = []
with open(file_path, 'r', encoding='utf-8') as f:
    for line in f:
        data.append(json.loads(line))
# Turn into a DataFrame
Trumpdf = pd.DataFrame(data)

import pandas as pd
import re
# Make sure your list of words is defined
word_list = set(american_words)
# Compile a regex pattern that matches any of the words, word-boundary safe
pattern = re.compile(r'\b(' + '|'.join(re.escape(word) for word in word_list) + r')\b', re.IGNORECASE)
# Apply a function to count matches in each row
Trumpdf["NegativeWordCount"] = Trumpdf["CleanText"].astype(str).apply(lambda text: len(pattern.findall(text)))
Trumpdf["TotalWordCount"] = Trumpdf["CleanText"].astype(str).apply(lambda text: len(re.findall(r'\b\w+\b', text)))
Trumpdf["neg_ratio"] = Trumpdf["NegativeWordCount"] / Trumpdf["TotalWordCount"] * 100
# Ensure the 'Date' column is in datetime format
Trumpdf["Date"] = pd.to_datetime(Trumpdf["Date"], errors="coerce")
# Drop rows where 'Date' is NaT (invalid dates)
Trumpdf = Trumpdf.dropna(subset=["Date"])
# Extract YearMonth in string format (YYYY-MM) for easier handling in ggplot
Trumpdf["YearMonth"] = Trumpdf["Date"].dt.to_period('M').astype(str)
# Calculate the average 'neg_ratio' by 'YearMonth'
monthly_avg_neg_ratio = Trumpdf.groupby("YearMonth")["neg_ratio"].mean().reset_index()
# Export the result to CSV for use in R
monthly_avg_neg_ratio.to_csv("monthly_avg_neg_ratio.csv", index=False)

library(reticulate)
library(ggplot2)
# Load the CSV file (make sure you have the correct path to the file)
df <- read.csv("monthly_avg_neg_ratio.csv")
# Convert 'YearMonth' to a date format
df$YearMonth <- as.Date(paste0(df$YearMonth, "-01"))
# Plot the data
ggplot(df, aes(x = YearMonth, y = neg_ratio)) +
  geom_line() +
  labs(title = "Monthly Average Aggression Ratio", x = "Month", y = "Aggression Ratio (%)") +
  theme_minimal()

# Sort by 'YearMonth' to ensure the rolling average works correctly
monthly_avg_neg_ratio = monthly_avg_neg_ratio.sort_values("YearMonth")
# Calculate the two-month rolling average of 'neg_ratio'
monthly_avg_neg_ratio["TwoMonthRollingAvg"] = monthly_avg_neg_ratio["neg_ratio"].rolling(window=2).mean()
# Export the result to CSV for use in R
monthly_avg_neg_ratio.to_csv("monthly_avg_neg_ratio_with_rolling_avg.csv", index=False)

library(ggplot2)
library(readr)
library(dplyr)
# Read the data
monthly_avg_neg_ratio <- read_csv("monthly_avg_neg_ratio_with_rolling_avg.csv")
# Convert YearMonth to Date type
monthly_avg_neg_ratio <- monthly_avg_neg_ratio %>%
  mutate(Date = as.Date(paste0(YearMonth, "-01")))
# Plot with ggplot
ggplot(monthly_avg_neg_ratio, aes(x = Date)) +
  geom_line(aes(y = neg_ratio), color = "blue", linetype = "dashed", size = 1) +
  geom_line(aes(y = TwoMonthRollingAvg), color = "red", size = 1) +
  labs(title = "Monthly Negative Ratio with Two-Month Rolling Average",
       x = "Date",
       y = "Negative Ratio (%)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")

Analysis of Aggression in the 75th Percentile
# Subset the DataFrame to select only rows where 'neg_ratio' > 0.206258
subset_df = Trumpdf[Trumpdf["neg_ratio"] > 0.206258]
# Calculate the average 'neg_ratio' by 'YearMonth'
subset_monthly_avg_neg_ratio = subset_df.groupby("YearMonth")["neg_ratio"].mean().reset_index()
# Export to a separate CSV for use in R (renamed so the full-sample file above is not overwritten)
subset_monthly_avg_neg_ratio.to_csv("subset_monthly_avg_neg_ratio.csv", index=False)

library(reticulate)
library(ggplot2)
# Load the CSV file (make sure you have the correct path to the file)
df_with_subset <- read.csv("subset_monthly_avg_neg_ratio.csv")
# Convert 'YearMonth' to a date format
df_with_subset$YearMonth <- as.Date(paste0(df_with_subset$YearMonth, "-01"))
# Plot the data
ggplot(df_with_subset, aes(x = YearMonth, y = neg_ratio)) +
  geom_line() +
  labs(title = "Monthly Average Aggression Ratio for the 75th percentile", x = "Month", y = "Aggression Ratio (%)") +
  theme_minimal()

Monthly Average Flesch Score
from textstat import flesch_reading_ease
Trumpdf['flesch_score'] = Trumpdf['CleanText'].apply(flesch_reading_ease)
# Calculate the average 'flesch_score' by 'YearMonth'
monthly_avg_flesch_score = Trumpdf.groupby("YearMonth")["flesch_score"].mean()
# Export the result to CSV for use in R
monthly_avg_flesch_score.to_csv("monthly_avg_flesch_score.csv", index=True)

library(ggplot2)
library(readr)
library(dplyr)
# Read the data
monthly_avg_flesch_score <- read_csv("monthly_avg_flesch_score.csv")
# Convert YearMonth to Date type
monthly_avg_flesch_score <- monthly_avg_flesch_score %>%
  mutate(Date = as.Date(paste0(YearMonth, "-01")))
# Plot with ggplot
ggplot(monthly_avg_flesch_score, aes(x = Date)) +
  geom_line(aes(y = flesch_score), color = "red", size = 1) +  # single series; the duplicated dashed layer was redundant
  labs(title = "Monthly Average Flesch Score",
       x = "Date",
       y = "Flesch Score") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")

Analysis of Flesch Score Above the 75th Percentile
# Subset the DataFrame to select only rows where 'flesch_score' > 68.72
subset_df_flesch_score = Trumpdf[Trumpdf["flesch_score"] > 68.72]
# Calculate the average 'flesch_score' by 'YearMonth'
subset_monthly_avg_flesch_score = subset_df_flesch_score.groupby("YearMonth")["flesch_score"].mean().reset_index()
# Export the result to CSV for use in R
subset_monthly_avg_flesch_score.to_csv("subset_monthly_avg_flesch_score.csv", index=False)  # index=False: YearMonth is already a column after reset_index()

library(ggplot2)
library(readr)
library(dplyr)
# Read the data
subset_monthly_avg_flesch_score <- read_csv("/Users/KaylaMuller/Desktop/text_analysis/week12/subset_monthly_avg_flesch_score.csv")
# Convert YearMonth to Date type
subset_monthly_avg_flesch_score <- subset_monthly_avg_flesch_score %>%
  mutate(Date = as.Date(paste0(YearMonth, "-01")))
# Plot with ggplot
ggplot(subset_monthly_avg_flesch_score, aes(x = Date)) +
  geom_line(aes(y = flesch_score), color = "red", size = 1) +  # single series; the duplicated dashed layer was redundant
  labs(title = "Monthly Average Flesch Score for the 75th Percentile",
       x = "Date",
       y = "Flesch Score") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")

8.1 Topic Modeling: Aggression in the 75th Percentile
import string
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
# One-time NLTK resources needed for tokenization, stopwords, and lemmatization
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)
# Step 0: Optional — Make a copy to avoid SettingWithCopyWarning
subset_df = subset_df.copy()
# Setup
stop = set(stopwords.words('english'))
stop.add('applause') # custom stopword
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()
# Combined cleaning function
def clean_text(text):
    text = text.lower()  # lowercase
    text = text.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    text = re.sub(r'\d+', '', text)  # remove numbers
    tokens = word_tokenize(text)  # tokenize
    tokens = [word for word in tokens if word not in stop]  # remove stopwords
    tokens = [lemmatizer.lemmatize(word) for word in tokens]  # lemmatization
    tokens = [stemmer.stem(word) for word in tokens]  # stemming
    return ' '.join(tokens)
# Apply to DataFrame
subset_df['CleanText_transformed'] = subset_df['CleanText'].apply(clean_text)

# Vectorize
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(max_df=0.9, min_df=2, stop_words='english') # stop_words optional now
dtm = vectorizer.fit_transform(subset_df['CleanText_transformed'])

from sklearn.decomposition import LatentDirichletAllocation
lda = LatentDirichletAllocation(n_components=7, random_state=42)  # 7 topics
lda.fit(dtm)

def display_topics(model, feature_names, num_top_words):
    for idx, topic in enumerate(model.components_):
        print(f"Topic #{idx + 1}: ", " ".join([feature_names[i] for i in topic.argsort()[:-num_top_words - 1:-1]]))
display_topics(lda, vectorizer.get_feature_names_out(), 10)

topic_results = lda.transform(dtm)
subset_df['DominantTopic'] = topic_results.argmax(axis=1)

WordClouds Representing Top Topics: AGGRESSION
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Get the feature names (words)
feature_names = vectorizer.get_feature_names_out()
# Loop over each topic
for topic_idx, topic_weights in enumerate(lda.components_):
    # Create dictionary: word -> weight
    word_freq = {feature_names[i]: topic_weights[i] for i in topic_weights.argsort()[:-31:-1]}  # top 30 words
    # Generate the word cloud
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(word_freq)
    # Plot the word cloud
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.title(f"Topic #{topic_idx + 1}")
    plt.show()

Topic Modeling: Simplicity in the 75th Percentile
import string
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
# Step 0: Optional — Make a copy to avoid SettingWithCopyWarning
subset_df_flesch_score = subset_df_flesch_score.copy()
# Setup
stop = set(stopwords.words('english'))
stop.add('applause') # custom stopword
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()
# Combined cleaning function
def clean_text(text):
    text = text.lower()  # lowercase
    text = text.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    text = re.sub(r'\d+', '', text)  # remove numbers
    tokens = word_tokenize(text)  # tokenize
    tokens = [word for word in tokens if word not in stop]  # remove stopwords
    tokens = [lemmatizer.lemmatize(word) for word in tokens]  # lemmatization
    tokens = [stemmer.stem(word) for word in tokens]  # stemming
    return ' '.join(tokens)
# Apply to DataFrame
subset_df_flesch_score['CleanText_transformed'] = subset_df_flesch_score['CleanText'].apply(clean_text)

# Vectorize
from sklearn.feature_extraction.text import CountVectorizer
vectorizer2 = CountVectorizer(max_df=0.9, min_df=2, stop_words='english') # stop_words optional now
dtm2 = vectorizer2.fit_transform(subset_df_flesch_score['CleanText_transformed'])  # use vectorizer2, not the aggression-model vectorizer

from sklearn.decomposition import LatentDirichletAllocation
lda2 = LatentDirichletAllocation(n_components=6, random_state=42)  # 6 topics
lda2.fit(dtm2)

def display_topics(model, feature_names, num_top_words):
    for idx, topic in enumerate(model.components_):
        print(f"Topic #{idx + 1}: ", " ".join([feature_names[i] for i in topic.argsort()[:-num_top_words - 1:-1]]))
display_topics(lda2, vectorizer2.get_feature_names_out(), 10)

topic_results2 = lda2.transform(dtm2)
subset_df_flesch_score['DominantTopic'] = topic_results2.argmax(axis=1)

WordClouds Representing Top Topics: SIMPLICITY
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Get the feature names (words)
feature_names = vectorizer2.get_feature_names_out()
# Loop over each topic
for topic_idx, topic_weights in enumerate(lda2.components_):
    # Create dictionary: word -> weight
    word_freq = {feature_names[i]: topic_weights[i] for i in topic_weights.argsort()[:-31:-1]}  # top 30 words
    # Generate the word cloud
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(word_freq)
    # Plot the word cloud
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.title(f"Topic #{topic_idx + 1}")
    plt.show()